
    Implicit Gradient Regularization

    Full text link
    Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. We confirm empirically that implicit gradient regularization biases gradient descent toward flat minima, where test errors are small and solutions are robust to noisy parameter perturbations. Furthermore, we demonstrate that the implicit gradient regularization term can be used as an explicit regularizer, allowing us to control this gradient regularization directly. More broadly, our work indicates that backward error analysis is a useful theoretical approach to the perennial question of how learning rate, model size, and parameter regularization interact to determine the properties of overparameterized models optimized with gradient descent.
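    Because the regularizer identified by the backward error analysis is proportional to the squared norm of the loss gradient, it can also be added to the training loss explicitly. The JAX sketch below illustrates that general idea with a toy linear model; the coefficient lam, the model, and the data are placeholders for illustration, not the authors' implementation.

        import jax
        import jax.numpy as jnp

        def loss_fn(params, x, y):
            # Toy model: linear predictor with mean-squared error.
            pred = x @ params
            return jnp.mean((pred - y) ** 2)

        def regularized_loss(params, x, y, lam=1e-2):
            # Explicit gradient regularization: penalize large loss gradients,
            # mirroring the implicit term induced by discrete gradient-descent steps.
            base = loss_fn(params, x, y)
            grads = jax.grad(loss_fn)(params, x, y)
            return base + lam * jnp.sum(grads ** 2)

        # One gradient step on the explicitly regularized loss.
        params = jnp.zeros(3)
        x, y = jnp.ones((8, 3)), jnp.ones(8)
        params = params - 0.1 * jax.grad(regularized_loss)(params, x, y)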

    Time Course of Altered Sensitivity to Inhibitory and Excitatory Agonist Responses in the Longitudinal Muscle–Myenteric Plexus and Analgesia in the Guinea Pig after Chronic Morphine Treatment

    Get PDF
    Tolerance that develops after chronic morphine exposure has been proposed to be an adaptive response that develops and decays over a defined time course. The present study examined the development of tolerance to the acute hypothermic and analgesic effects of morphine and correlated the time course of desensitization in vivo with the reduced responsiveness to DAMGO and 2-CADO, and the increased responsiveness to nicotine, of the longitudinal muscle/myenteric plexus (LM/MP) preparation in vitro. Assessment was performed at various times after morphine or placebo pellet implantation. Morphine produced a modest hypothermic response to which no tolerance developed. However, tolerance to the analgesic effect of morphine and to the inhibitory effects of DAMGO and 2-CADO on neurogenic twitches of the LM/MP, as well as hypersensitivity to the contractile response to nicotine, developed in a time-dependent manner. The alterations in sensitivity to DAMGO and nicotine and in responsiveness to morphine analgesia occurred between days 4 and 10 and returned to normal by day 14 post-implantation. In contrast, the sensitivity of LM/MP preparations to 2-CADO displayed a similar time-dependent onset, but the tolerance persisted beyond 14 days after implantation. These data suggest that the heterologous tolerance that develops after chronic morphine treatment is time-dependent and persistent but ultimately returns to normal in the absence of any intervention. Furthermore, the data suggest that the basis of this adaptive phenomenon may involve multiple cellular mechanisms, including the modulation of cell excitability and normal physiology, but that the consequences of the adaptation extend to all effects of the agonist.

    Distinct Quantum States Can Be Compatible with a Single State of Reality

    Get PDF
    Perhaps the quantum state represents information about reality, and not reality directly. Wave function collapse is then possibly no more mysterious than a Bayesian update of a probability distribution given new data. We consider models for quantum systems with measurement outcomes determined by an underlying physical state of the system, but where several quantum states are consistent with a single underlying state; i.e., the probability distributions for distinct quantum states overlap. Significantly, we demonstrate by example that additional assumptions are always necessary to rule out such a model.
    Comment: 5 pages, 2 figures

    Why neural networks find simple solutions: the many regularizers of geometric complexity

    Full text link
    In many contexts, simpler models are preferable to more complex models, and the control of this model complexity is the goal of many methods in machine learning, such as regularization, hyperparameter tuning, and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, a measure of the variability of the model function computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics, such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization, all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
    Comment: Accepted as a NeurIPS 2022 paper
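    A discrete Dirichlet energy of this kind can be estimated from data as the mean squared norm of the network's input Jacobian over a batch. The JAX sketch below computes such a quantity for a toy two-layer network; the architecture, random data, and the exact form of the energy are illustrative assumptions rather than the paper's precise definition or code.

        import jax
        import jax.numpy as jnp

        # Toy two-layer network; parameters and data are placeholders.
        def model(params, x):
            w1, b1, w2, b2 = params
            h = jnp.tanh(x @ w1 + b1)
            return h @ w2 + b2

        def geometric_complexity(params, xs):
            # Discrete Dirichlet energy over the data: mean squared Frobenius
            # norm of the model's input Jacobian, a measure of how much the
            # learned function varies across the input space.
            jac_fn = jax.jacobian(lambda x: model(params, x))
            sq_norms = jax.vmap(lambda x: jnp.sum(jac_fn(x) ** 2))(xs)
            return jnp.mean(sq_norms)

        key = jax.random.PRNGKey(0)
        k1, k2, k3 = jax.random.split(key, 3)
        params = (
            jax.random.normal(k1, (4, 16)), jnp.zeros(16),
            jax.random.normal(k2, (16, 2)), jnp.zeros(2),
        )
        xs = jax.random.normal(k3, (32, 4))
        print(geometric_complexity(params, xs))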